Clustering Mixed Data via Latent Variable Models
Abstract
A model-based clustering procedure for data of mixed type, termed clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. The model employs a parsimonious covariance structure for the latent variables, leading to a suite of six clustering models that vary in complexity and provide an elegant and unified approach to clustering mixed data. An expectation-maximisation (EM) algorithm is used to estimate the model; in the presence of nominal data, a Monte Carlo EM algorithm is required. The clustMD model is illustrated by clustering prostate cancer patients, on whom measurements of mixed type have been recorded.
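The generative idea in the abstract, a latent Gaussian mixture whose draws are turned into observed variables of mixed type, can be illustrated with a small simulation. The following is a minimal sketch, not the clustMD implementation: the function name `simulate_mixed_data`, the three-column layout, the cut points and all parameter values are assumptions made for illustration, and nominal variables are left out.

```python
import numpy as np

def simulate_mixed_data(n, weights, means, covs, ordinal_cuts, seed=0):
    """Draw n rows from a latent Gaussian mixture and discretise some columns.

    Hypothetical layout (an assumption of this sketch):
      column 0 is observed directly as a continuous variable,
      column 1 is cut into ordinal levels at `ordinal_cuts`,
      column 2 is cut at zero to give a binary variable.
    """
    rng = np.random.default_rng(seed)
    # Pick a mixture component (cluster) for every observation.
    labels = rng.choice(len(weights), size=n, p=weights)
    latent = np.empty((n, 3))
    for g in range(len(weights)):
        idx = labels == g
        # Latent vector for cluster g is Gaussian with that cluster's parameters.
        latent[idx] = rng.multivariate_normal(means[g], covs[g], size=int(idx.sum()))
    continuous = latent[:, 0]
    ordinal = np.digitize(latent[:, 1], ordinal_cuts)  # levels 0..len(ordinal_cuts)
    binary = (latent[:, 2] > 0).astype(int)
    return labels, continuous, ordinal, binary

# Illustrative parameter values only.
weights = [0.6, 0.4]
means = [np.array([0.0, -1.0, -1.0]), np.array([3.0, 1.0, 1.0])]
covs = [np.eye(3), 0.5 * np.eye(3)]
labels, y_cont, y_ord, y_bin = simulate_mixed_data(
    500, weights, means, covs, ordinal_cuts=[-0.5, 0.5]
)
```

Clustering reverses this process: given only `y_cont`, `y_ord` and `y_bin`, the aim is to recover `labels` and the mixture parameters, which the abstract says is done with an EM algorithm.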
Similar resources
Model based clustering for mixed data: clustMD
A model-based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the lat...
Clustering South African Households Based on Their Asset Status Using Latent Variable Models.
The Agincourt Health and Demographic Surveillance System has since 2001 conducted a biannual household asset survey in order to quantify household socio-economic status (SES) in a rural population living in northeast South Africa. The survey contains binary, ordinal and nominal items. In the absence of income or expenditure data, the SES landscape in the study population is explored and describ...
Beta-Binomial and Ordinal Joint Model with Random Effects for Analyzing Mixed Longitudinal Responses
The analysis of discrete mixed responses is an important statistical issue in various sciences. Ordinal and overdispersed binomial variables are discrete. Overdispersed binomial data are a sum of correlated Bernoulli experiments with equal success probabilities. In this paper, a joint model with random effects is proposed for analyzing mixed overdispersed binomial and ordinal longitudinal respo...
Specifying Latent Structure Characteristics in Mixed-membership Models
Latent variable mixture models provide an important tool for the analysis of text and relational data. They encompass techniques like topic models for language modeling, and mixed-membership block models, which model relational data that are represented as graphs. A central characteristic of mixed-membership models is their ability to uncover latent structure from large data in a fully unsuper...
From Topic Models to Semi-supervised Learning: Biasing Mixed-Membership Models to Exploit Topic-Indicative Features in Entity Clustering
We present methods to introduce different forms of supervision into mixed-membership latent variable models. Firstly, we introduce a technique to bias the models to exploit topic-indicative features, i.e. features which are a priori known to be good indicators of the latent topics that generated them. Next, we present methods to modify the Gibbs sampler used for approximate inference in such mod...